Stragglers are commonly believed to have a great impact on the performance of big data systems. However, the causes of stragglers are complicated. Previous work mostly focuses on straggler detection, scheduling-level optimization, and coarse-grained cause analysis. These methods cannot provide valuable insights to help users optimize their programs. In this paper, we propose BigRoots, a general method incorporating both framework and system features for root-cause analysis of stragglers in big data systems. BigRoots considers features from the big data framework, such as shuffle read/write bytes and JVM garbage collection time, as well as system resource utilization, such as CPU, I/O, and network, which enables it to detect both internal and external root causes of stragglers. We verify BigRoots by injecting high resource utilization across different system components and perform case studies to analyze different workloads in HiBench. The experimental results demonstrate that BigRoots is effective in identifying the root causes of stragglers and provides useful guidance for performance optimization.